\documentclass[]{article}
%\usepackage{hyperref}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{textcomp}
\usepackage{amsthm}
\usepackage{bigstrut}
\usepackage[margin=1in]{geometry}
\usepackage{natbib}
\usepackage{graphicx}
\usepackage{listings}
\usepackage{color}
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage{cancel}

\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
\definecolor{mauve}{rgb}{0.58,0,0.82}


 
\lstset{ %
  language=MATLAB,                % the language of the code
  basicstyle=\footnotesize,           % the size of the fonts that are used for the code
  numbers=left,                   % where to put the line-numbers
  numberstyle=\tiny\color{gray},  % the style that is used for the line-numbers
  stepnumber=4,                   % the step between two line-numbers. If it's 1, each line 
                                  % will be numbered
  numbersep=5pt,                  % how far the line-numbers are from the code
  backgroundcolor=\color{white},      % choose the background color. You must add \usepackage{color}
  showspaces=false,               % show spaces adding particular underscores
  showstringspaces=false,         % underline spaces within strings
  showtabs=false,                 % show tabs within strings adding particular underscores
  tabsize=2,                      % sets default tabsize to 2 spaces
  captionpos=b,                   % sets the caption-position to bottom
  breaklines=true,                % sets automatic line breaking
  breakatwhitespace=false,        % sets if automatic breaks should only happen at whitespace
  keywordstyle=\color{blue},          % keyword style
  commentstyle=\color{dkgreen},       % comment style
  stringstyle=\color{mauve},         % string literal style
  escapeinside={\%*}{*)},            % if you want to add a comment within your code
  morekeywords={*,...}               % if you want to add more keywords to the set
}

\newenvironment{reff}%
{\color{\mycolor}}%
{}




\newcommand{\Ex}{\mathbb{E\/}}
\def\Var{\mathop{\rm Var}} 
\newcommand{\Pro}{\mathbb{P\/}}
\newcommand{\A}{\mathcal{A}}
\newcommand{\C}[1]{\mathcal{#1}}
\DeclareMathOperator*{\argmin}{arg\,min}
\newcommand{\mycolor}{blue}

\theoremstyle{definition}% default plain
\newtheorem{thm}{Theorem}
\newtheorem{prop}{Proposition}
\newtheorem{cor}{Corollary}
\renewcommand{\qedsymbol}{$\blacksquare$}
\newtheorem{lem}{Lemma}
\newtheorem*{notation}{Notation}

\theoremstyle{definition}
\newtheorem*{defn}{Definition}
\newtheorem{conj}{Conjecture}
\newtheorem{exmp}{Example}
\newtheorem{note}{Note}
\newtheorem{obs}{Observation}


%opening
\title{The Optimal Time to Initiate Arteriovenous Fistula Creation for Chronic Kidney\\
Disease Patients}
\author{Reza Skandari\\
The University of British Columbia
}


\begin{document}
\bibliographystyle{plain}

\maketitle

\begin{abstract}

Most patients with chronic kidney disease eventually end up on dialysis as a substitute for their kidneys' function. For hemodialysis (HD), the most prevalent form of dialysis, patients need a vascular access. The gold standard access for HD is an arteriovenous fistula (AVF), which is created surgically. It takes several months for an AVF to mature to the point where it can support HD. If the AVF is not mature by the time the patient starts HD, dialysis is conducted through an inferior temporary access called a central venous catheter (CVC). Both early and late creation of an AVF come with reduced quality of life and other complications. In this paper, we investigate the timing of AVF creation with the objective of minimizing the reduction in quality of life due to poor timing of AVF preparation. We model the problem as a Markov decision process, and draw a parallel to a machine replacement problem. Under certain conditions, we find that the optimal policy is of the control limit threshold (CLT) type; i.e., we should create the AVF as soon as the patient's health is worse than some threshold. We also find that with a (stochastically) larger maturation time, or a higher penalty for lateness, we want to initiate AVF creation earlier. Through numerical computations, we find that the optimal policy is always of the CLT type under a more general disease progression assumption.
\end{abstract}
\section{Introduction and Motivation}
Approximately 23 million American adults have chronic kidney disease (CKD) \cite{NIH}, and  550,000 have end-stage kidney disease \cite{NKUDIC}.  The vast majority of these patients are treated with hemodialysis (HD) \cite{NKUDIC}.  The preferred vascular access for HD is an arteriovenous fistula (AVF) \cite{guideline}; however, it may take several months and more than one procedure to establish an AVF that can be used for HD \cite{rayner2003creation}.  The downside of an AVF that is not mature by the time dialysis starts is the need for a central venous catheter (CVC), which is associated with an increased risk of complications such as sepsis, a higher mortality rate, and reduced quality of life compared to HD with an AVF \cite{wasse2007association}.  On the other hand, creating an AVF also carries a small but increased risk of complications such as high-output cardiac failure, and a reduction in quality of life.  Moreover, fistulas have limited lifetimes that may get used up prior to HD; thus it is undesirable for them to be created too early \cite{o2010whether}. Knowledge of the most appropriate time to begin AVF preparation is therefore critical, as it directly impacts patient outcomes.


In this research, we restrict ourselves to just one aspect of poor timing of AVF creation, and that is reduced quality of life because of early or late preparation of AVF. In other words, we want to minimize the deviation of the time AVF is ready from the time dialysis starts. In the following section, we model the problem as a Markov decision process, and draw analogies to a  similar problem in the machine replacement literature.
\section{Problem Definition} \label{definition}
Assume that the state of a system can be expressed by a single number in the set $\{F=0, 1,2,\ldots,H\}$, $H \le \infty$, where higher states correspond to a better condition, and $F$ corresponds to the failure state. Failure is permanent, so the state $F$ is absorbing. When the system breaks down, i.e. when the state $F$ is reached, we need to use a certain substitutional system (henceforth called the substitute). The substitute has a stochastic ordering lead time. If the substitute has not arrived by the time the system fails, a penalty is incurred which depends on the duration of time the substitute is \textbf{\textit{late}}. On the other hand, if the substitute arrives when the system has not failed yet, a penalty is incurred which depends on the duration of time the substitute is \textbf{\textit{early}}.

For the AVF creation decision problem, states may correspond to health state of the patient (for example the stage of the CKD, or any other break down of eGFR level), and failure state denotes the kidney failure. The substitute is dialysis via the AVF, and the lead time corresponds to the maturation time. Also, the penalties for earliness and lateness represent the reduced quality of life associated with an early or late prepared AVF, respectively.

Assume that the system fails at time $T_F$, and the substitute arrives at time $T_A$. We define $\Delta:=T_A-T_F$. The total penalty, for any realized value $\delta$ of $\Delta$, is defined as
$$\mathcal{C}(\delta)=f_L([\delta]^+)+f_E([-\delta]^+), \hspace{25pt} [x]^+:=\max\{x,0\},$$
where $f_L$ and $f_E$ are positive, increasing functions denoting the cost of lateness and earliness, respectively. We refer to $\mathcal{C}$ as the penalty function.

In each period, the state of the system is observed, and the decision maker chooses to either order the substitute, or wait for another period. The substitute is ordered when we reach state $F$, if we have not ordered it yet. System condition transitions occur at the end of each period. The objective is to minimize the expected substitution cost, i.e. to minimize $\Ex [\mathcal{C}(\Delta)]$.

We formally describe the model through the following components:
\begin{itemize}
\item \textit{Decision Epochs} $t \in \{1,2, \ldots\}$: the discrete decision epochs of an infinite-horizon problem.

\item\textit{States} $h \in \{\A, F, 1, \ldots,H\}$. State $\A$ is an absorbing state denoting that the substitute has already been ordered. When we order the substitute, we instantaneously transition to state $\A$.

\item\textit{Actions} We denote the action of waiting for one more period by $W$ and the action of ordering the substitute by $O$. Also note that in state $F$, we take action $O$.

\item\textit{State Transition Probabilities} If we choose action $O$, we transition to $\A$ with probability one. Otherwise, if we take action $W$, we transition from state $h \in \{1,\ldots ,H\}$ to $h' \in \{F,1,\ldots ,H\}$ according to a Markovian process. The probability of transitioning from $h$ to $h'$ is denoted by $P_{h,h'}$. We let $P$ represent the transition probability matrix under action $W$.

\item\textit{Costs} We assume that there is no immediate cost associated with action $W$. Let $M$ denote the (\textit{stochastic}) ordering lead time ($M$ for maturation) which is independent of the state $h$. Also let $T_h$ denote the (stochastic) time to reach state $F$ starting from state $h$. The immediate cost of action $O$ is then defined as 
$C(h):=\Ex [\mathcal{C}(M-T_h)]$. We refer to this cost as the substitution cost.

\item\textit{Value Function} Let $V(h)$ be the optimal expected total cost given that the current
state is $h$. The value function is obtained as the solution to Bellman's equation:
\begin{align*}
&V^*(h)=\min \{ C(h), \sum\limits_{h'=F}^{H} P_{h,h'}V^*(h')\} \hspace{25pt} \forall h \in \{1,\ldots ,H\},\\
&V^*(F)=C(F).
\end{align*}
\end{itemize}
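To make the components above concrete, the following minimal sketch (with entirely hypothetical parameters $H$, $p$, $q$, and a geometric lead time, none of which come from the paper) estimates the substitution cost by simulation and solves Bellman's equation by value iteration for the random-walk special case studied later:

```python
import random

# Hypothetical instance: states F=0,...,H; under action W the state follows a
# downward-drifting random walk; ordering in state h costs C(h) = E[penalty(M - T_h)].
H = 10
p, q = 0.4, 0.1       # P(h-1|h)=p, P(h+1|h)=q, drift toward failure (q < p)
pM = 0.3              # lead time M ~ Geometric(pM) on {1,2,...}

def penalty(delta):
    """Convex penalty C(delta): lateness weighted more heavily than earliness."""
    return 2.0 * max(delta, 0) + 1.0 * max(-delta, 0)

rng = random.Random(0)

def sample_T(h):
    """Sample T_h, the time to reach the failure state F=0 from state h."""
    t = 0
    while h > 0:
        u = rng.random()
        if u < p:
            h -= 1
        elif u < p + q and h < H:
            h += 1
        t += 1
    return t

def sample_M():
    m = 1
    while rng.random() > pM:
        m += 1
    return m

def substitution_cost(h, n=10000):
    """Monte-Carlo estimate of C(h) = E[penalty(M - T_h)]."""
    return sum(penalty(sample_M() - sample_T(h)) for _ in range(n)) / n

cost = [substitution_cost(h) for h in range(H + 1)]

# Value iteration for Bellman's equation:
#   V(h) = min{ C(h), p V(h-1) + q V(h+1) + (1-p-q) V(h) },   V(F) = C(F).
V = cost[:]
for _ in range(500):
    for h in range(1, H + 1):
        up = V[h + 1] if h < H else V[h]
        V[h] = min(cost[h], p * V[h - 1] + q * up + (1 - p - q) * V[h])

policy = ['O' if abs(V[h] - cost[h]) < 1e-9 else 'W' for h in range(H + 1)]
print(policy)
```

With these made-up parameters, the computed policy orders in the low states and waits in the high ones, anticipating the control limit structure derived in Section \ref{monotonicity}.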
We assume that the failure state $F$ is accessible from all other states of the system (not necessarily in a single step), and thus this problem can be classified as a stochastic shortest path problem, in which we want to minimize the cost of reaching state $\A$. Although the transition probabilities were stated in full generality, our most important results are proved only for random walk transitions.

In the machine replacement literature, this problem belongs to the category of ordering and replacement of Markovian deteriorating systems. The closest work to ours is by Kawai \cite{kawai1983optimal}, who studied the ordering and replacement problem for a continuous-time Markovian degrading system. Kawai assumed that the system has different replacement and operating costs in different states, and the objective is to minimize the total cost, consisting of the operating cost, the replacement cost, and a possible holding cost for the spare part, under the average cost criterion for the infinite-horizon problem. Under some conditions, the optimal policy is of ($N$,$n$) type; i.e., we should order the spare part if the state is equal to or worse than $n$, and replace the current system with the spare part (once it has been ordered and has arrived) if the state of the system is not any better than state $N$. Our work differs from Kawai \cite{kawai1983optimal} in the following aspects:
\begin{itemize}
\item We consider the expected total cost as the optimality criterion. In the machine replacement context, the average cost criterion is appropriate, because the system renews each time its component is replaced, and as long as replacements continue, the system keeps running. In the patient's health problem, however, the replacement happens only once, and even after the replacement the system does not renew. Therefore, we have to consider the total cost as our criterion. Also note that although the horizon of our problem is finite with probability one (under a rational choice of model parameters), we consider an infinite horizon because the horizon length is random; for the machine replacement problem, in contrast, the time horizon is infinite with probability one.

\item In an average cost problem, our course of action not only determines the current cycle's cost, but also determines (stochastically) the state at which we begin the next cycle(s). Thus, we may take some adverse actions (with a high cost for the current cycle) to balance the system, so that we start the future cycles in more favourable states. But we do not have such an option in the patient's health problem.
\end{itemize}
For a comprehensive literature review of machine replacement problems, we refer the reader to \cite{dohi1998optimal}, and \cite{wang2009condition}.

In this paper, we study structural properties of the optimal policy. More specifically, we prove that under certain conditions there is a monotone optimal policy, meaning that if the system occupies a state $h$ higher than a certain threshold, say $\hbar$, we take action $W$, and otherwise we take action $O$. We also show how certain input parameters of the model change the optimal threshold. To achieve this goal, in Section \ref{convexity} we study the convexity of the substitution cost for random walk transitions. Convexity of the substitution cost is the key to deriving structural properties of the optimal policy and value function in Section \ref{monotonicity}. Later, in Section \ref{comp}, we see that convexity of the substitution cost is a sufficient but not necessary condition for a monotone optimal policy.

\section{Structure of substitution cost} \label{convexity}
In this section, we give necessary and sufficient conditions under which the substitution cost is convex. Recall that the substitution cost is defined as $C(h)=\Ex [\C{C}(M-T_h)]$, where $M$ is independent of the state. Note that the substitution cost depends on the state $h$ through $T_h$. Before proving convexity of the substitution cost, we show that the order of the $T_h$s conforms to the order of the states (in the stochastic sense).

\subsection{Stochastic order of time-to-absorption}
First, we define the relevant orders on random variables:
\begin{reff}

\begin{defn} [Stochastic order] We say random variable $X_1$ is stochastically less than or equal to random variable $X_2$, and denote that by $X_1 \preceq X_2$, if and only if $F_{X_2} \le F_{X_1}$, where $F_X$ is the CDF of the random variable $X$.
\end{defn}

\begin{defn} [Equality in distribution] We say random variable $X_1$ is equal in distribution to $X_2$, and denote that by $X_1 \sim X_2$, if and only if $F_{X_2} = F_{X_1}$.
\end{defn}
\end{reff}
\begin{notation}
Let $A$ denote a matrix. Vectors $a_i$ and $a^j$ denote the $i$th row and the $j$th column of $A$, respectively. Also, $a^j_i$ denotes the $i$th component of the $j$th column of $A$, or equivalently the $(i,j)$th component of $A$. We also use $A_{i,j}$ to represent $a^j_i$.
\end{notation}

We assume that the transition matrix $P$ has the increasing failure rate (IFR) property, defined below. The IFR property is a central assumption in most machine replacement problems, and is the key to our intuition about monotonicity of the optimal policy and/or value function.


\begin{defn} [IFR property]
Let $P$ be an $n$ by $n$ matrix. We say $P$ has the increasing failure rate (IFR) property iff its rows are increasing in the stochastic order sense. Define $d^i:=\sum\limits_{k=1}^{i} p^k$, where $p^k$ is the $k$th column of $P$. Then the IFR property equivalently holds iff, for all $i = 1,\ldots,n$, $d^i$ is a non-increasing vector, i.e. $d^i_1 \ge d^i_2 \ge \ldots \ge d^i_n$.
\end{defn}
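The partial-sum characterization in the definition is easy to test numerically; the following small sketch (the matrices are illustrative only, not taken from the model) does so:

```python
# Check the IFR property via the partial column sums d^i from the definition above.
def is_ifr(P):
    """True iff, for every i, the vector d^i of partial column sums is
    non-increasing in the row index (i.e. rows are stochastically increasing)."""
    n = len(P)
    for i in range(1, n + 1):
        d = [sum(row[:i]) for row in P]   # d^i, one entry per row/state
        if any(d[r] < d[r + 1] - 1e-12 for r in range(n - 1)):
            return False
    return True

# A downward-drifting random walk with absorbing failure state 0: IFR.
P_walk = [[1.0, 0.0, 0.0],
          [0.6, 0.3, 0.1],
          [0.0, 0.6, 0.4]]

# Rows not stochastically ordered: not IFR.
P_bad = [[0.5, 0.5],
         [1.0, 0.0]]

print(is_ifr(P_walk), is_ifr(P_bad))
```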
{\color{\mycolor}
\begin{prop}  [Alternative definition for stochastic order] \label{order}
Let $X_1$ and $X_2$ be two random variables. We have $X_1 \preceq X_2$ if and only if $\Ex [f(X_1)] \le \Ex [f(X_2)]$ for any non-decreasing real valued function $f$.
\end{prop}
For proof, refer to reference \cite{shaked2007stochastic}.
}


\begin{lem} \label{AB}
Let $A$ and $B$ be two $n$ by $n$ matrices with IFR property. Then $AB$ has the IFR property.
\end{lem}


\begin{proof}
~\\
Decompose the matrices $A$ and $B$ as follows: $A=[a_1,a_2,\ldots,a_n]$ (rows), $B=[b^1|b^2|\ldots|b^n]$ (columns). Then $AB$ is as follows: \\

$AB= \left[ \begin{matrix}
a_1b^1 & a_1 b^2 &\ldots &a_1 b^n \\
\vdots & \vdots  &\vdots &\vdots  \\
a_nb^1 & a_n b^2 &\ldots &a_n b^n \\
\end{matrix}\right]$.

Fix $k$. Define $d^k := \sum\limits_{i=1}^{k} [AB]^i$, where $[AB]^i$ denotes the $i$th column of $AB$. We need to show that $d^k$ is a non-increasing vector (i.e. $i \le j \implies d_i^k \ge d_{j}^k$). For $d^k$ we have:
$$d^k= \left[ \begin{matrix}
a_1(b^1+b^{2}+\ldots+b^k)\\
\vdots \\
a_n(b^1+b^{2}+\ldots+b^k)
\end{matrix}\right].$$
Define random variables $Y_i$ such that
 $\Pro(Y_i=j)=a_i^j$. Since $A$ is IFR, we have $Y_1 \preceq Y_2 \preceq \ldots \preceq Y_n$. Also, since $B$ is an IFR matrix, $f(i):=[b^1+b^{2}+\ldots+b^k]_i$ is non-increasing in $i$. Note that $\Ex [f(Y_i)]=a_i[b^1+b^{2}+\ldots+b^k]=d^k_i$.
Since $-f$ is non-decreasing, by Proposition \ref{order} we have $i \le j \implies -\Ex [f(Y_i)] \le -\Ex [f(Y_j)]$, which implies $i \le j \implies d_i^k \ge d_{j}^k$.
\end{proof}
\begin{notation}
Let $A$ be a square matrix. We denote the $n$th power of $A$ by $A^{(n)}$.
\end{notation}

\begin{lem} \label{fundam}
Let $P$ be an $n$ by $n$ transition probability matrix over the states $\{0,1,\ldots,n-1\}$. Assume that state $0$ ($=F$) is absorbing. Then $\Pro (T_i \le k)=P^{(k)}_{i,0}$ for $k \ge 1$.
\end{lem}
\begin{proof}
~\\
We prove the lemma by induction.
\\

\noindent $\bullet$ Base case ($k=1$):

If we start in state $0$, we never leave it. Therefore, $\Pro (T_0 \le k)=1$ for all $k$. Also, by assumption $P_{0,0}=1$, and thus $\Pro (T_0 \le k)=P_{0,0}$. 

If we start in state $i>0$, to reach state $0$, we need at least one transition. So, $\Pro (T_i =0)=0$, and as a result $\Pro (T_i \le 1)=\Pro (T_i = 1)=P_{i,0}$. Therefore, $\Pro (T_i \le 1)=P_{i,0}$.


\noindent $\bullet$ Inductive step ($k=n$): 

Assume the result holds for $k=n-1$. Starting from $i$, let $s_1$ denote the state after the first transition. Note that we have 
\begin{align*}
\Pro (T_i \le n)=\sum\limits_{j=0}^{n-1} \Pro (T_i \le n|s_1=j)\Pro(s_1=j)=\sum\limits_{j=0}^{n-1} \Pro (T_j \le n-1)P_{i,j}.
\end{align*}
By the induction assumption, $\Pro (T_j \le n-1)=P^{(n-1)}_{j,0}$. Therefore, 
$\Pro (T_i \le n)=\sum\limits_{j=0}^{n-1} P_{i,j}P^{(n-1)}_{j,0}=P^{(n)}_{i,0}$.

\end{proof}
Now, we are in the position to prove the first important result.
\begin{thm} [Stochastic order of $T_h$] \label{Ti}
Let $T_h$ denote the (stochastic) time to reach state $F$ starting from state $h$. Given that the transition probability matrix $P$ has the IFR property, the $T_h$s are (stochastically) increasing. In other words, 
$$T_1 \preceq T_2 \preceq  \ldots \preceq T_H. $$
\end{thm}

\begin{proof}
~\\
Assume $P$ is $n$ by $n$. We need to prove that $\Pro (T_i \le m) \ge \Pro (T_j \le m)$ for all $i\le j$, and $m \ge 0$. Recall that $P^{(m)}$ gives us the $m$-step transition probability. According to Lemma \ref{fundam}
\begin{align} \label{CDF}
\Pro (T_i \le m)=P^{(m)}_{i,F}.
\end{align}
It remains to prove that $P^{(m)}_{i,F}$ is non-increasing in $i$. Since $P$ is IFR, by Lemma \ref{AB} we have that $P^{(2)}=PP$ is IFR, and by induction $P^{(m)}=P^{(m-1)}P$ is IFR. Thus $P^{(m)}_{i,F}$ is non-increasing in $i$.
\end{proof}
Note that we do not necessarily have such a property for states other than $F$; i.e., the times to reach a state $h \ne F$ may not be ordered. We have the following generalization:
\begin{cor} \label{Ti2}
Assume that the transition probability $P$ has IFR property. Define $T_h^m$ as the time to first visit to the state $m$ or worse (states $i \le m$) starting from state $h$. Then for all $m$, 
$$T_1^m \preceq T_2^m \preceq  \ldots \preceq T_H^m. $$
\end{cor}
\begin{proof}[Sketch of the proof]
Fix $m$. Based on the original Markov chain, construct a new chain in which all of the states $F,\ldots,m$ are combined into one \textit{absorbing} state, while the rest of the chain is unchanged. Then $T_h^m$ is the time to absorption for this chain. It is easy to show that the transition matrix of this chain has the IFR property. The result then follows from Theorem \ref{Ti}.
\end{proof}
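As a numerical sanity check of Lemma \ref{fundam} and Theorem \ref{Ti}, the following sketch (with a hypothetical IFR chain in which state $0$ plays the role of $F$) verifies that the CDFs $\Pro(T_i \le k)=P^{(k)}_{i,0}$ remain ordered in $i$ across matrix powers:

```python
# Multiply out powers of P and check that the column P^(k)_{.,0}, i.e. the
# CDFs of the absorption times T_i, is non-increasing in i for every k.
def mat_mult(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

P = [[1.0, 0.0, 0.0, 0.0],    # state 0 = F, absorbing; rows are IFR
     [0.5, 0.3, 0.2, 0.0],
     [0.0, 0.5, 0.3, 0.2],
     [0.0, 0.0, 0.5, 0.5]]

ordered = True
Pk = P
for k in range(1, 31):
    cdfs = [row[0] for row in Pk]            # Pr(T_i <= k) for i = 0,...,3
    ordered &= all(cdfs[i] >= cdfs[i + 1] - 1e-12 for i in range(3))
    Pk = mat_mult(Pk, P)
print(ordered)
```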
\subsection{Convexity in random walks}
In the previous section, we proved monotonicity of the random variables $T_h$. Example \ref{nonunimodal} shows that this, on its own, is not enough to guarantee convexity of the substitution cost. Therefore, we add an extra assumption on the sequence $T_h$ that guarantees convexity of $C$, and then show that for certain random walks the extra condition holds.

{\color{\mycolor}
\begin{prop} \label{order_pres}
Let $A_1,\ldots,A_n$ and $B_1,\ldots,B_n$ be independent random variables with $A_i \preceq B_i$ for all $i$, and let $u$ be a real valued multivariate non-decreasing function. Then $$u(A_1,\ldots, A_n) \preceq u(B_1,\ldots, B_n).$$
\end{prop} 
For proof, refer to reference \cite{shaked2007stochastic}.}

\begin{lem} \label{main}
Let $Y_i$ be an increasing sequence of random variables, i.e. $$Y_1 \preceq Y_2 \preceq Y_3 \ldots $$ Moreover, assume that $Y_i\sim Y_{i-1} + D_i$, where $D_i$ is independent of $Y_{i-1}$ and $D_i \sim D \ge 0$ for all $i$. Let $f(x)$ be a real valued convex function. Then $g(i):=\Ex [f(Y_i)]$ is a convex (discrete convex) function.
\end{lem}
\begin{proof}
~\\
We need to show that $\Delta_i:=\Ex [f(Y_{i+1})- f(Y_{i})]$ is non-decreasing in $i$.\\
Fix $d \in \mathcal{R}^{\ge 0}$. Since $f$ is convex, we have $f(x+d)-f(x)$ is increasing in $x$. By Proposition \ref{order_pres}, $ f(Y_i+d)- f(Y_i)$ is increasing in $i$, i.e.
\begin{align} \label{r1}
i \le j: f(Y_i+d)- f(Y_i) \preceq  f(Y_j+d)- f(Y_j).
\end{align}
Also by Proposition \ref{order}, Equation \ref{r1} implies that $ \Ex [f(Y_i+D)- f(Y_i)]$ is increasing in $i$ for any random variable $D$, i.e.
\begin{align}
i \le j: \Ex [f(Y_i+D)- f(Y_i)] \le  \Ex [f(Y_j+D)- f(Y_j)].
\end{align}
By assumption $D_i \sim D$ for all $i$, so $\Ex [f(Y_i+D_i)]=\Ex [f(Y_i+D)]$ for all $i$. Furthermore, note that 
$\Ex [f(Y_{i+1})]=\Ex [f(Y_{i}+D)]$. As a result, $\Delta_i=\Ex [f(Y_{i+1})-f(Y_{i})]$ is non-decreasing in $i$.
\end{proof}

{\color{\mycolor}
\begin{prop} \label{linear_convex_pres}
Let $f$ be a multivariate convex function from $\mathcal{R}^n$ to $\mathcal{R}$. Then for any matrix $A$ and vector $b$, $f(Ax+b)$ is convex in $x$.
\end{prop}
For proof, refer to reference \cite{shetty1993nonlinear}.
}
\begin{cor} [Main Result] \label{mainres}
Let $T_h$ denote the (stochastic) time to reach state $F$ starting from state $h$. Assume $T_i \sim T_{i-1}+D_i$ and $D_i \sim D \ge 0$ for all $i$. If $\mathcal{C}(\cdot)$ is a convex function, then the substitution cost $C(h)$ is convex in $h$.
\end{cor}
\begin{proof}
~\\
By Theorem \ref{Ti}, the $T_h$s are (stochastically) increasing. For random variables $A_1$ and $A_2$, define $u(A_1,A_2):=A_1+A_2$, which is a non-decreasing function. Since $T_i \preceq T_{i+1}$ and $-M \preceq -M$, by Proposition \ref{order_pres} we have
 \begin{align}
u(T_i,-M) \preceq u(T_{i+1},-M) \implies T_{i}-M \preceq T_{i+1}-M
\end{align}
Also, we have that $$T_i-M \sim T_{i-1}+D_i-M,$$ and $D_i-M \sim D-M$, $\forall i$.
So, all the conditions of Lemma \ref{main} for the sequence $T_i-M$ are satisfied. Define function $g(x):=\C{C}(-x)$. Note that by Proposition \ref{linear_convex_pres}, $g$ is also convex. As a result of Lemma \ref{main}, $C(h)=\Ex g(T_h-M)=\Ex [\mathcal{C}(M-T_h)]$ is convex in $h$.
\end{proof}
\begin{exmp} \label{nonunimodal}
For Lemma \ref{main}, having the $Y_i$ stochastically ordered, on its own, does not guarantee unimodality of $g(i)$, let alone convexity. As an example, assume $f(x)=|x-3|$ and the random variables $Y_1=0$, $Y_2=1$, $Y_3=2+4B$, $Y_4=3+3B'$, where $B$ and $B'$ are two independent Bernoulli random variables with $p=0.7$. It is easy to show that $Y_1<Y_2<Y_3 \preceq Y_4$. Also note that $g(1)=\Ex [f(Y_1)]=3$, $g(2)=2$, $g(3)=2.4$, $g(4)=2.1$; thus $g$ is not even unimodal.
\end{exmp}
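The expectations in the example can be checked directly:

```python
# Quick numerical check of the example above (B, B' ~ Bernoulli(0.7)).
def f(x):
    return abs(x - 3)

p = 0.7
g1 = f(0)                               # Y_1 = 0
g2 = f(1)                               # Y_2 = 1
g3 = (1 - p) * f(2) + p * f(2 + 4)      # Y_3 = 2 + 4B
g4 = (1 - p) * f(3) + p * f(3 + 3)      # Y_4 = 3 + 3B'
print(g1, g2, g3, g4)                   # approximately 3, 2, 2.4, 2.1
```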

In Lemma \ref{main} we gave a sufficient condition; here we provide necessary conditions as well. These necessary conditions are used to identify conditions under which a random walk transition induces a convex substitution cost.



\begin{prop} \label{adel}
Let $Y_i$ be an increasing sequence of random variables, i.e. $$Y_1 \preceq Y_2 \preceq Y_3 \ldots $$ 
and let $f(x)$ be a real valued convex function. Then if $g(i):=\Ex[ f(Y_i)]$ is a convex function, we have 
\begin{itemize}
\item $\Ex [Y_j] -\Ex [Y_i]=\mu(j-i), \forall i,j$ for some $\mu \ge 0$, 
\item $\Ex [Y_i-t]^+$ is convex in $i$ for any $t$.
\end{itemize}
\end{prop}
\begin{proof}
~\\
By letting $Z(i)=Y_i$ and applying Proposition 2 of \cite{stoch}, the results follow: since $\Ex [Z(i)]$ must be linear in $i$, we have $\Ex [Y_j] -\Ex [Y_i]=\mu(j-i)$ for all $i,j$; the second condition follows directly from the same proposition.
\end{proof}


In Propositions \ref{rand_walk} and \ref{conj}, we study conditions under which a random walk state transition induces a convex substitution cost. Essentially, we prove that the conditions of Corollary \ref{mainres} are met.
{\color{\mycolor}
\begin{prop} \label{IFR}
Assume for the transition matrix $P$, we have that $P_{i,j}=\phi(j-i)$ for some function $\phi: \mathcal{Z} \to \mathcal{R}^{\ge 0}$. Then matrix $P$ is IFR.
\end{prop}
For proof, refer to reference \cite{puterman1994markov}.
}
\begin{prop} \label{rand_walk}
Assume the transition matrix $P$ has the following properties:
\begin{itemize}
\item $P(F|F)=1$
\item $P(h-1|h)=p$, $P(h+1|h)=q$, and $P(h|h)=1-p-q$, $\forall H>h>F$
\item $P(H-1|H)=p$, $P(H|H)=1-p$.
\end{itemize}
Also assume $\mathcal{C}(\cdot)$ is a convex function. Then the substitution cost is convex if and only if $q=0$.
\end{prop}
\begin{proof}
~\\
($\Leftarrow$, $q=0$ implies convexity):\\
Note that by Proposition \ref{IFR}, $P$ has the IFR property, so by Theorem \ref{Ti} the $T_i$s are (stochastically) increasing. Also, since $q=0$, we have $T_i \sim T_{i-1}+\mathcal{G}(p)$ for all $i$, where $\mathcal{G}(p)$ denotes a geometric distribution with success probability $p$. So all the conditions of Corollary \ref{mainres} are met, and thus $C(h)$ is convex in $h$.\\
($\Rightarrow$, convexity implies $q=0$):\\
We show that the first condition of Proposition \ref{adel} is met only if $q=0$.\\
Define $t_h:=\Ex [T_h]$. For $h=H$ we have that $t_H=1+pt_{H-1}+(1-p)t_H$, and thus, $t_H-t_{H-1}=\frac{1}{p}$.\\
By Proposition \ref{adel}, we need to have $t_i-t_{i-1}=\frac{1}{p}$, which is equivalent to $t_i=t_H-\frac{1}{p}(H-i)$.
Also, note that for all $0<i<H$, we have 
$$t_i=1+pt_{i-1}+qt_{i+1}+(1-p-q)t_i \implies t_i=\frac{1+pt_{i-1}+qt_{i+1}}{p+q}.$$
Now, if we plug in $t_i=t_H-\frac{1}{p}(H-i)$, we have:
\begin{align*}
t_i&=\frac{1+pt_{i-1}+qt_{i+1}}{p+q}=\frac{1}{p+q} \left[p\left(t_H-\frac{1}{p}(H-i+1)\right)+q\left(t_H-\frac{1}{p}(H-i-1)\right)+1\right] \\
&=\frac{1}{p+q} \left[ (p+q)\left(t_H-\frac{1}{p}(H-i)\right)+\frac{q}{p}\right]=t_H-\frac{1}{p}(H-i)+\frac{q}{p(p+q)}=t_i+\frac{q}{p(p+q)} \implies q=0.
\end{align*}
\end{proof}
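The hitting-time recursions in the proof can be checked numerically. The sketch below (with hypothetical $p$, $q$) computes $t_i=\Ex[T_i]$ by iterating $t = 1 + P_{\mathrm{transient}}\,t$:

```python
# Expected hitting times for the random walk of the Proposition, obtained by
# fixed-point iteration of t = 1 + P_transient t (hypothetical parameters).
def hitting_times(H, p, q, sweeps=20000):
    t = [0.0] * (H + 1)        # t[0] = 0: the failure state F
    for _ in range(sweeps):
        new = t[:]
        for h in range(1, H):
            new[h] = 1 + p * t[h - 1] + q * t[h + 1] + (1 - p - q) * t[h]
        new[H] = 1 + p * t[H - 1] + (1 - p) * t[H]
        t = new
    return t

t_q0 = hitting_times(5, 0.5, 0.0)   # q = 0: t_i = i/p, linear in i
t_q  = hitting_times(5, 0.5, 0.2)   # q > 0: only the top increment equals 1/p
print(t_q0)
print(t_q[5] - t_q[4], t_q[1] - t_q[0])
```

For $q=0$ every increment is $1/p$ (linear $t_i$), while for $q>0$ the top increment is exactly $1/p$ but the lower increments are strictly larger, matching the contradiction derived in the proof.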
\begin{prop}  \label{conj}
Assume that $H=\infty$, and transition matrix $P$ has the following properties:
\begin{itemize}
\item $P(F|F)=1$
\item $P(h-1|h)=p$, $P(h+1|h)=q$, and $P(h|h)=1-p-q$, $\forall h>F$
\end{itemize}
Furthermore, assume that $q<p$ and that $\mathcal{C}(\cdot)$ is a convex function. Then the substitution cost is convex.
\end{prop}
\begin{proof}
~\\
We show that the conditions of Corollary \ref{mainres} are met. By Proposition \ref{IFR}, $P$ has the IFR property, and by Theorem \ref{Ti} the $T_i$s are (stochastically) increasing. Note that by spatial homogeneity of the walk, $T_j \sim T_{j-1}+T_1$, and $q<p$ ensures that $T_1$ is finite with probability one. So all the conditions of Corollary \ref{mainres} are met (with $D \sim T_1 \ge 0$), and thus $C(h)$ is convex in $h$.
\end{proof}
\section{Structural Properties} \label{monotonicity}
In this section, we show the existence of monotone optimal policies for the problem defined in Section \ref{definition}, where the transitions are random walks. We use the convexity of the substitution cost proved in the previous section to derive the results.

\subsection{Monotone policies under random walk}
Although in the original problem the cost of stopping in a given state, $C(h)$, is endogenous to the process (it depends on the transition probability matrix and the penalty function), our results hold for any convex $C(h)$; we therefore consider a general stopping problem in which $C(h)$ is exogenous and given.

\begin{thm} \label{main_res_2} Assume the transition probabilities given in Proposition \ref{rand_walk}, and $H \le \infty$. Also assume that $q<p$ and that the cost of stopping $C(i)$ is convex. Then the optimal policy and the value function are monotone.
\end{thm}

\begin{proof} 
~\\
Define $\mathcal{S}:=\argmin_h \{C(h)\}$, $\mathcal{S}_1:=\{ h < \min(\mathcal{S})\}$, and $\mathcal{S}_2:=\{h > \max(\mathcal{S})\}$. We prove that the optimal policy is as follows: if $h \in \mathcal{S}_2$, we have $\pi^*(h)=W$, otherwise $\pi^*(h)=O$, i.e. the threshold is $\hbar=\max(\mathcal{S})$.\\

For states belonging to $\mathcal{S}$, action $O$ is optimal: waiting is free, but every policy must eventually stop and pay at least $\min_h C(h)$, which is exactly what stopping now costs; thus $V^*(\mathcal{S})=\min_h \{C(h)\}$. For states $h \in \mathcal{S}_2$, since $q<p$ the state drifts toward $F$, and because transitions are between adjacent states, a process started in $\mathcal{S}_2$ reaches $\max(\mathcal{S})$ with probability one before reaching $F$. Hence, if we take action $W$, we eventually end up in a state in $\mathcal{S}$; so action $W$ is optimal for those states, and $V^*(h\in \mathcal{S}_2)=\min_h \{C(h)\}$.

It remains to prove $\pi^*(h)=O$ for states $h \in \mathcal{S}_1$. The proof is based on the policy improvement algorithm: starting from the proposed policy, we show that the algorithm cannot improve upon it. Fix a state $i \in \mathcal{S}_1$, and note that
 $V_0(j \in \mathcal{S}_1 \cup \mathcal{S})=C(j)$. We have 
$$\pi_1(i)=\argmin\{C(i),pC(i-1)+qC(i+1)+(1-p-q)C(i)\}.$$
It remains to show that $pC(i-1)+qC(i+1)+(1-p-q)C(i) \ge C(i)$. Note that

\begin{align} \label{i2}
pC(i-1)+qC(i+1)+(1-p-q)C(i) \ge C(i) \Leftrightarrow C(i) \le p' C(i-1)+q' C(i+1),
\end{align}
where $q':=\frac{q}{q+p}$, and $p':=\frac{p}{q+p}$. 

Define $\Delta_1:=C(i)-C(i-1)$, and $\Delta_2:=C(i+1)-C(i)$. By simple algebra, Equation \ref{i2} is equivalent to $q'\Delta_2-p'\Delta_1 \ge 0$. By convexity of $C$, we have $\Delta_1 \le\Delta_2$; moreover, since $C$ is non-increasing on $\mathcal{S}_1 \cup \mathcal{S}$, we have $\Delta_1 \le \Delta_2 \le 0$. Since $q'<p'$, it follows that $q'\Delta_2-p'\Delta_1 \ge (q'-p')\Delta_2 \ge 0$. 
\end{proof}

\begin{note} The result of Theorem \ref{main_res_2} holds under the following more general transition matrix $P$:
\begin{itemize}
\item $P(F|F)=1$
\item $P(h-1|h)=p_h$, $P(h+1|h)=q_h$, and $P(h|h)=1-p_h-q_h$, $\forall H>h>F$
\item $P(H-1|H)=p_H$, $P(H|H)=1-p_H$.
\end{itemize}
where $q_h<p_h$.
\end{note}
\begin{prop} Assume the same conditions as in Theorem \ref{main_res_2}, but with $H=\infty$ and $q>p$. Given that $f_E$ is strictly increasing and unbounded, the optimal policy is to take action $W$ in every state $h \ne F$. Also, $V^*(h)$ is monotone.
\end{prop}
\begin{proof}
~\\
Define $a(h)$ as the probability of absorption starting from state $h$. By spatial homogeneity, the probability of ever moving one state down equals $a(1)$ from every state, so $a(i+1)=a(1)a(i)$ and $$a(i)=(1-p-q)a(i)+pa(i-1)+qa(1)a(i)\implies a(i)=\frac{pa(i-1)}{p+q-qa(1)}.$$ 
If $q>p$, we have $a(1)=p/q < 1$, which implies $a(i)=a(1)^i$ is strictly decreasing. Since $a(i)\le a(1)<1$, we have $\Pro(T_i=\infty)>0$. Since $f_E$ is strictly increasing and unbounded, $C(h)=\infty$ for $h \ne F$. As a consequence, it is optimal to take $W$ whenever possible. Doing so, if we get absorbed we incur $C(F)$, and otherwise we incur zero cost; thus $V^*(h)=a(h)C(F)$. Since $a(h)$ is decreasing in $h$, $V^*(h)$ is monotone decreasing.
\end{proof}
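The recursion in the proof solves to $a(i)=(p/q)^i$, the classical gambler's-ruin absorption probability. A quick Monte-Carlo check with hypothetical $p<q$:

```python
import random

# Estimate the absorption probabilities a(1), a(2) for an upward-drifting
# walk (q > p) and compare with the closed form a(i) = (p/q)^i.
rng = random.Random(1)
p, q = 0.2, 0.4

def absorbed(h, horizon=300):
    """One sample path: does the walk started at h hit F=0 within `horizon` steps?
    (With upward drift, unabsorbed paths escape; the truncation bias is tiny.)"""
    for _ in range(horizon):
        if h == 0:
            return True
        u = rng.random()
        if u < p:
            h -= 1
        elif u < p + q:
            h += 1
    return h == 0

n = 5000
a1 = sum(absorbed(1) for _ in range(n)) / n
a2 = sum(absorbed(2) for _ in range(n)) / n
print(a1, a2)   # close to p/q = 0.5 and (p/q)^2 = 0.25
```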

\subsection{Impact of inputs on the threshold}
In the previous section, we showed that for random walks with drift, the optimal policy is of the control limit threshold type. Here, we examine how the model inputs affect the threshold. Specifically, we prove that with a (stochastically) larger lead time, the threshold becomes higher, i.e. we start ordering the substitute in a higher state. The same phenomenon is proved when the cost of lateness is relatively higher.

\begin{thm} Assume the same conditions as in Theorem \ref{main_res_2}, but with a cost function of the following form:
$\mathcal{C}(\delta)=\alpha f_L([\delta]^+)+f_E([-\delta]^+)$, where the parameter $\alpha \ge 0$ represents the relative weight of the lateness cost to the earliness cost. Then for $\alpha_1 \le \alpha_2$, if $\pi^*_1(h)=O$, then $\pi^*_2(h)=O$.
\end{thm}

\begin{proof}
~\\
Theorem \ref{main_res_2} implies that the optimal action is $O$ precisely in the states where $\Delta_i=C(i)-C(i-1) \le 0$. We prove that $\Delta^2_i \leq \Delta^1_i$, where $\Delta^j_i$ denotes $\Delta_i$ under $\alpha_j$; the result then follows immediately.

Define $E_i:=\Ex[ f_E(T_i-M)]$ and $L_i:=\Ex[ f_L(M-T_i)]$ as the expected earliness and lateness costs for state $i$. Since $T_{i-1} \preceq T_{i}$ and $f_L$ is a non-decreasing function, Proposition \ref{order} implies that $L_i$ is non-increasing in $i$.

\begin{align*}
\Delta^2_i -\Delta^1_i &= \left[E_i+\alpha_2 L_i-(E_{i-1}+\alpha_2 L_{i-1})\right] - \left[E_i+\alpha_1 L_i-(E_{i-1}+\alpha_1 L_{i-1})\right] \\
&= (\alpha_2 - \alpha_1)(L_i-L_{i-1}) \le 0.
\end{align*}
\end{proof}
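The mechanism of this proof can be illustrated with a small exact computation. The distributions below are hypothetical (chosen only so that $T_{i-1} \preceq T_i$), with $f_E(x)=f_L(x)=[x]^+$ and a deterministic maturation time $M=4$:

```python
# Illustration of the argument above with hypothetical discrete
# distributions: T_{i-1} stochastically smaller than T_i.
M = 4
T_prev = {2: 0.5, 3: 0.3, 5: 0.2}   # T_{i-1}
T_curr = {3: 0.5, 5: 0.3, 7: 0.2}   # T_i, stochastically larger

def E_and_L(T):
    """Expected earliness E = E[(T-M)^+] and lateness L = E[(M-T)^+]."""
    E = sum(p * max(t - M, 0) for t, p in T.items())
    L = sum(p * max(M - t, 0) for t, p in T.items())
    return E, L

E1, L1 = E_and_L(T_prev)
E2, L2 = E_and_L(T_curr)

def delta(alpha):
    # Delta_i = C(i) - C(i-1), with C = E + alpha * L
    return (E2 + alpha * L2) - (E1 + alpha * L1)

d1, d2 = delta(1.0), delta(5.0)   # alpha_1 <= alpha_2
print(d1, d2)
```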

\begin{prop}[Coupling or sample-space construction] \label{coupling}
We have $X_1 \preceq X_2$ if and only if there exist random variables $\mathcal{X}_i \sim X_i$, defined on a common sample space $\Omega$, such that $\mathcal{X}_1(\omega) \le \mathcal{X}_2 (\omega)$ for every $\omega \in \Omega$. We call $\mathcal{X}_i$ the shadow copy of $X_i$. Furthermore, we have $\Ex [f(X_i)]=\Ex [f(\mathcal{X}_i)]$ for any integrable function $f$.
\end{prop}
\begin{lem} \label{shift}
Assume a convex function $f$, and $X_1 \preceq X_2$, and $M_1 \preceq M_2$. Then $$\Ex [f(X_2+M_1)]-\Ex [f(X_1+M_1)] \leq \Ex [f(X_2+M_2)]-\Ex [f(X_1+M_2)].$$
\end{lem}
\begin{proof}
~\\
Using coupling (Proposition \ref{coupling}), we can find shadow copies of $X_i$, say $\C{X}_i$, such that $\C{X}_2 = \C{X}_1+\Delta_X$ with $\Delta_X := \C{X}_2-\C{X}_1 \ge 0$. Note that since $f$ is a convex function, $f(\C{X}_1+\Delta_X+m)-f(\C{X}_1+m)$ is (stochastically) increasing in $m$, i.e.
\begin{align}
m_1 \le m_2: f(\C{X}_1+\Delta_X+m_1)-f(\C{X}_1+m_1) \preceq f(\C{X}_1+\Delta_X+m_2)-f(\C{X}_1+m_2).
\end{align}
Thus, by Proposition \ref{order_pres} we have
\begin{align*}
&f(\C{X}_1+\Delta_X+M_1)-f(\C{X}_1+M_1) \preceq f(\C{X}_1+\Delta_X+M_2)-f(\C{X}_1+M_2)
\end{align*}
Also, by Proposition \ref{order}, we have
\begin{align*}
\Ex [f(\C{X}_1+\Delta_X+M_1)]-\Ex [f(\C{X}_1+M_1)] \leq \Ex [f(\C{X}_1+\Delta_X+M_2)]-\Ex [f(\C{X}_1+M_2)]
\end{align*}
Finally, using the fact that $\Ex [f(X_i)]=\Ex [f(\mathcal{X}_i)]$, the result follows.
\end{proof}
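As a concrete check, the inequality of Lemma \ref{shift} can be verified exactly for small hypothetical distributions, here with the convex function $f(x)=x^2$, $X_1 \preceq X_2$, and $M_1 \preceq M_2$, all independent:

```python
# Exact check of the lemma's inequality for hypothetical distributions.
f = lambda x: x * x              # a convex function
X1 = {0: 0.5, 1: 0.5}
X2 = {1: 0.5, 2: 0.5}            # X1 stochastically smaller than X2
M1 = {0: 0.7, 2: 0.3}
M2 = {1: 0.7, 3: 0.3}            # M1 stochastically smaller than M2

def E_f_sum(X, M):
    """E[f(X + M)] for independent discrete X and M."""
    return sum(px * pm * f(x + m) for x, px in X.items() for m, pm in M.items())

lhs = E_f_sum(X2, M1) - E_f_sum(X1, M1)
rhs = E_f_sum(X2, M2) - E_f_sum(X1, M2)
print(lhs, rhs)
```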
\begin{thm}
Assume the same conditions as in Theorem \ref{main_res_2}, and two random ordering lead times $M_1$ and $M_2$ such that $M_1 \preceq M_2$. Let $\pi^*_j$ denote the optimal policy under $M_j$. If $\pi^*_1(h)=O$, then $\pi^*_2(h)=O$.
\end{thm}
\begin{proof}
~\\
We prove this theorem by showing $\Delta^2_i \leq \Delta^1_i$, where, analogously to the previous theorem, $\Delta^j_i$ denotes $\Delta_i$ under lead time $M_j$. By definition we have:
\begin{align*}
\Delta^1_i =\Ex [\C{C}(M_1-T_i)]-\Ex [\C{C}(M_1-T_{i-1})], \Delta^2_i=\Ex [\C{C}(M_2-T_i)]-\Ex [\C{C}(M_2-T_{i-1})].
\end{align*}
Define $X_1:=-T_{i}$ and $X_2:=-T_{i-1}$; then $X_1 \preceq X_2$. According to Lemma \ref{shift}, $-\Delta^1_i \leq - \Delta^2_i$, which gives $\Delta^2_i \leq \Delta^1_i$.
\end{proof}
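This comparison can also be illustrated exactly. The inputs below are hypothetical: hitting-time distributions with $T_{i-1} \preceq T_i$, a deterministic lead time $M_1$ stochastically smaller than $M_2$, and the convex penalty $\C{C}(\delta)=5[\delta]^+ + [-\delta]^+$:

```python
# Exact illustration of the lead-time comparison with hypothetical inputs.
T_prev = {2: 0.5, 3: 0.3, 5: 0.2}   # T_{i-1}
T_curr = {3: 0.5, 5: 0.3, 7: 0.2}   # T_i, stochastically larger
M1 = {4: 1.0}                       # deterministic lead time
M2 = {5: 0.5, 6: 0.5}               # stochastically larger lead time

def penalty(delta):
    # C(delta) = 5*[delta]^+ + [-delta]^+
    return 5.0 * max(delta, 0) + max(-delta, 0)

def expected_cost(M, T):
    """E[C(M - T)] for independent discrete M and T."""
    return sum(pm * pt * penalty(m - t)
               for m, pm in M.items() for t, pt in T.items())

# Delta^j_i = E[C(M_j - T_i)] - E[C(M_j - T_{i-1})]
d1 = expected_cost(M1, T_curr) - expected_cost(M1, T_prev)
d2 = expected_cost(M2, T_curr) - expected_cost(M2, T_prev)
print(d1, d2)
```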




\section{Computational results}\label{comp}
In the previous sections, we studied only random walks with drift for the state transitions. In this section, we report some computational results under a general transition probability matrix. The baseline values for the model parameters are as follows:
\begin{itemize}
\item The penalty cost: $\mathcal{C}(\delta)=\alpha [\delta]^+ + [-\delta]^+$, $\alpha=5$.
\item The number of states: $H+1=8$.
\item Transition probabilities: $P_{i,j}=\phi(j-i)$ for a function $\phi: \mathbb{Z} \to \mathbb{R}_{\ge 0}$.
\item Maturation time: discrete uniform from 3 to 9 periods, in steps of $\frac{1}{4}$ of a period.
\end{itemize}
Note that by Proposition \ref{IFR}, the transition matrix is IFR for all functions $\phi$. To find the optimal solution, we implemented the Gauss-Seidel algorithm in MATLAB (provided in the Appendix). We observed the following:
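A simplified, self-contained analogue of this computation (not the MATLAB code used for the reported results) can be sketched in Python. It assumes a random-walk $\phi$ with hypothetical $p=0.6$, $q=0.2$, and takes the maturation time as integer-uniform on $\{3,\dots,9\}$ rather than quarter-period steps, for brevity:

```python
import numpy as np

# Sketch of the stopping problem: states 0..H with 0 = F (absorbing),
# random-walk transitions (down w.p. p, up w.p. q, stay otherwise),
# maturation time M ~ uniform{3,...,9}, penalty = alpha*[delta]^+ + [-delta]^+.
H, p, q, alpha = 7, 0.6, 0.2, 5.0
N = 400  # truncation horizon for the hitting-time distribution

# Transition matrix P (state 0 absorbing).
P = np.zeros((H + 1, H + 1))
P[0, 0] = 1.0
for h in range(1, H + 1):
    P[h, h - 1] = p
    if h < H:
        P[h, h + 1] = q
        P[h, h] = 1.0 - p - q
    else:
        P[h, h] = 1.0 - p

# pmf of the hitting time T_i of state 0: T[i, t] = P(T_i = t).
T = np.zeros((H + 1, N + 1))
T[0, 0] = 1.0
for t in range(1, N + 1):
    T[1:, t] = P[1:, :] @ T[:, t - 1]

# Expected mismatch cost C(i) = E[penalty(M - T_i)], M ~ uniform{3..9};
# mass beyond the horizon N is negligible here and is simply dropped.
M_support = np.arange(3, 10)
def penalty(delta):
    return alpha * max(delta, 0.0) + max(-delta, 0.0)
C = np.array([
    sum(T[i, t] * penalty(m - t)
        for t in range(N + 1) for m in M_support) / len(M_support)
    for i in range(H + 1)
])

# Gauss-Seidel sweeps for V(h) = min{ C(h), sum_j P(j|h) V(j) }, V(0) = C(0).
V = C.copy()
for _ in range(1000):
    for h in range(1, H + 1):
        V[h] = min(C[h], P[h] @ V)

policy = ['O' if C[h] <= P[h] @ V else 'W' for h in range(1, H + 1)]
print(policy)
```

With these inputs the computed policy comes out as a control-limit rule: $O$ in the low states near failure and $W$ in the high states, consistent with the observations reported below.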
\begin{obs} If the transition matrix is lower-triangular, the substitution cost is unimodal, but the value function may be non-unimodal; see the following example. If the lower-triangularity of $P$ is removed, both the terminal cost and the value function can be non-unimodal.
\end{obs}
\begin{exmp} \label{figp}
Assume the transition matrix $P$ generated by $\phi(0)=.95$, $\phi(1)=.01$, $\phi(2)=.04$. The value function and the terminal cost are shown in Figure \ref{fig:plot}.
\end{exmp}
\begin{figure}[htbp]
\centering
\includegraphics[scale=0.65]{./Matlab/plot}
\caption{Cost and value function for the transition matrix given in Example \ref{figp}}
\label{fig:plot}
\end{figure}
\begin{obs} Under every function $\phi$ we tested, every distribution for the maturation time (over the 3--9 period domain), and every value of $\alpha$, the optimal policy was of CLT type.
\end{obs}
Based on these observations, we suggest the following conjecture:
\begin{conj}
Given the following assumptions, the optimal policy is of the CLT type:
\begin{itemize}
\item Matrix $P$ is IFR.
\item Penalty cost $\C{C}$ is convex.
\end{itemize}
\end{conj}
\section{Conclusions}
In this paper, we studied the optimal timing of AVF creation for patients with chronic kidney disease. We modelled the problem as a Markov decision process, drew an analogy to the machine replacement literature, and derived analytical results for the optimal policy. We assumed a random walk with drift for the state transitions. Although this may seem restrictive, depending on how the states and decision epochs are defined, it can be quite reasonable. For example, consider the AVF creation problem; we can define the states according to eGFR levels as follows: state $F:=[0,15)$, $S_1:=[15,20)$, and so forth. It is very rare for a patient's eGFR level to drop by more than 5 units in a month; thus, a random-walk transition makes sense in this setting.

A number of assumptions limit our model: we assume that all AVF creation surgeries are successful and that an AVF, once created, works forever. We also did not model patient death. Moreover, we assumed that the disease progression is known a priori. To address this, we could incorporate learning into our decision model as follows: use the existing history of patients to form a prior distribution for the disease-progression parameters; then, using the observations we periodically take from the patient, apply Bayesian updating to refine the estimates of that patient's disease-progression parameters. Doing so makes the decisions more patient-specific and should result in better patient outcomes.

\newpage
\section{Appendix} \label{appendix2}
MATLAB code for the computational results part.
\lstinputlisting[language=Matlab]{./Matlab/alll.m}

\newpage
\bibliography{references}

\end{document}
